    Outlier Detection in Heterogeneous Datasets using Automatic Tuple Expansion

    Rapidly developing areas of information technology are generating massive amounts of data. Human errors, sensor failures, and other unforeseen circumstances unfortunately tend to undermine the quality and consistency of these datasets by introducing outliers -- data points that exhibit surprising behavior when compared to the rest of the data. Characterizing, locating, and in some cases eliminating these outliers offers interesting insight about the data under scrutiny and reinforces the confidence that one may have in conclusions drawn from otherwise noisy datasets. In this paper, we describe a tuple expansion procedure that reconstructs rich information from semantically poor SQL data types such as strings, integers, and floating-point numbers. We then use this procedure as the foundation of a new user-guided outlier detection framework, dBoost, which relies on inference and statistical modeling of heterogeneous data to flag suspicious fields in database tuples. We show that this novel approach achieves good classification performance, both in traditional numerical datasets and in highly non-numerical contexts such as mostly textual datasets. Our implementation is publicly available under version 3 of the GNU General Public License.
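
    The two ingredients above, tuple expansion and per-feature statistical modeling, can be sketched in a few lines of Python. Everything below (the function names, the timestamp heuristic, the 3-sigma Gaussian threshold) is a hypothetical illustration, not dBoost's actual implementation:

```python
# Hypothetical sketch of tuple expansion + per-feature outlier flagging.
# Not dBoost's API; a toy model of the idea described in the abstract.
from datetime import datetime, timezone
from statistics import mean, stdev

def expand(value):
    """Tuple expansion: reconstruct rich features from a poor SQL type."""
    if isinstance(value, int) and 10**9 < value < 2 * 10**9:
        # Plausibly a Unix timestamp: expose its calendar structure.
        t = datetime.fromtimestamp(value, tz=timezone.utc)
        return {"year": t.year, "month": t.month,
                "weekday": t.weekday(), "hour": t.hour}
    if isinstance(value, str):
        return {"length": len(value), "digits": sum(c.isdigit() for c in value)}
    return {"value": float(value)}

def flag_outliers(column, theta=3.0):
    """Flag fields whose expanded features lie > theta stdevs from the mean.
    Assumes a homogeneous column, so all expansions share the same keys."""
    expanded = [expand(v) for v in column]
    stats = {f: (mean(e[f] for e in expanded), stdev(e[f] for e in expanded))
             for f in expanded[0]}
    return [(i, f)
            for i, e in enumerate(expanded)
            for f, (mu, sigma) in stats.items()
            if sigma > 0 and abs(e[f] - mu) > theta * sigma]

# 18 timestamps at the same time of day, plus one an hour off.
stamps = [1400000000 + d * 86400 for d in range(18)] + [1400000000 + 3600]
print(flag_outliers(stamps))  # -> [(18, 'hour')]
```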

    Meta-F*: Proof Automation with SMT, Tactics, and Metaprograms

    We introduce Meta-F*, a tactics and metaprogramming framework for the F* program verifier. The main novelty of Meta-F* is allowing the use of tactics and metaprogramming to discharge assertions not solvable by SMT, or to simplify them into well-behaved SMT fragments. In addition, Meta-F* can be used to generate verified code automatically. Meta-F* is implemented as an F* effect, which, given F*'s powerful effect system, greatly increases code reuse and even enables lightweight verification of metaprograms. Metaprograms can be either interpreted or compiled to efficient native code that can be dynamically loaded into the F* type-checker and can interoperate with interpreted code. Evaluation on realistic case studies shows that Meta-F* provides substantial gains in proof development, efficiency, and robustness. (Full version of the ESOP'19 paper.)
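
    As a toy illustration of the division of labor described above (in Python rather than F*, with entirely hypothetical names), a metaprogram can first massage an assertion into pieces that a solver handles well, leaving the solver to discharge what remains:

```python
# Toy model of "tactics simplify goals into well-behaved SMT fragments".
# Meta-F* metaprograms are F* terms running inside the F* type-checker;
# nothing here reflects its actual API.

def smt(goal: str) -> bool:
    """Stand-in SMT backend that only accepts conjunction-free goals."""
    return "and" not in goal

def split_conjunctions(goal: str) -> list:
    """'Tactic': replace the goal 'p and q' by the subgoals 'p', 'q'."""
    return [g.strip() for g in goal.split(" and ")]

def prove(goal: str) -> bool:
    subgoals = split_conjunctions(goal)   # the metaprogram runs first...
    return all(smt(g) for g in subgoals)  # ...SMT discharges what remains

print(prove("x >= 0 and x + 1 > x"))      # True
```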

    Compilation using correct-by-construction program synthesis

    Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. By Clément Pit-Claudel. This electronic version was submitted by the student author; the certified thesis is available in the Institute Archives and Special Collections. Cataloged from the student-submitted PDF version of the thesis. Includes bibliographical references (pages [131]-134).
    Extracting and compiling certified programs may introduce bugs in otherwise proven-correct code, reducing the extent of the guarantees that proof assistants and correct-by-construction program-derivation frameworks provide. We explore a novel approach to extracting and compiling embedded domain-specific languages developed in a proof assistant (Coq), showing how it allows us to extend correctness guarantees all the way down to a verification-aware assembly language. Our core idea is to phrase compilation of shallowly embedded programs to a lower-level deeply embedded language as a synthesis problem, solved using simple proof-search techniques. This technique is extensible (support for individual language constructs is provided by a user-extensible database of compilation tactics and lemmas) and allows the source programs to depend on axiomatically specified methods of externally implemented data structures, delaying linking to the assembly stage. Composed with the Fiat and Bedrock frameworks, our new method provides the first proof-generating automatic translation from SQL-style relational programs into executable assembly code.
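
    The core idea, compilation as proof search over an extensible database of lemmas, can be caricatured in Python (the thesis works in Coq; all names below are invented for illustration): each "lemma" maps a source construct to target instructions plus remaining subgoals, and a recursive search assembles the translation.

```python
# Hypothetical sketch: compilation phrased as search over an extensible
# database of "compilation lemmas" (nothing like the thesis's Coq code).
LEMMAS = []

def lemma(f):
    """Register a compilation lemma; users can extend the database."""
    LEMMAS.append(f)
    return f

@lemma
def compile_const(e):
    if isinstance(e, int):
        return [f"PUSH {e}"], []          # instructions, no subgoals

@lemma
def compile_add(e):
    if isinstance(e, tuple) and e[0] == "+":
        return ["ADD"], [e[1], e[2]]      # compile both operands first

def compile_expr(e):
    for lem in LEMMAS:                    # proof search: try each lemma
        result = lem(e)
        if result is not None:
            code, subgoals = result
            return [i for g in subgoals for i in compile_expr(g)] + code
    raise NotImplementedError(f"no compilation lemma applies to {e!r}")

print(compile_expr(("+", 1, ("+", 2, 3))))
# -> ['PUSH 1', 'PUSH 2', 'PUSH 3', 'ADD', 'ADD']
```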

    Relational compilation: Functional-to-imperative code generation for performance-critical applications

    Purely functional programs verified using interactive theorem provers typically need to be translated to run: either by extracting them to a similar language (like Coq to OCaml) or by proving them equivalent to deeply embedded implementations (like C programs). Traditionally, the first approach is automated but produces unverified programs with average performance, and the second approach is manual but produces verified, high-performance programs. This thesis shows how to recast program extraction as a proof-search problem to automatically derive correct-by-construction, high-performance code from shallowly embedded functional programs. It introduces a unifying framework, relational compilation, to capture and extend recent developments in program extraction, with a focus on modularity and sound extensibility. To demonstrate the value of this approach, it then presents Rupicola, a relational compiler-construction toolkit designed to extract fast, verified, idiomatic low-level code from annotated functional models. The originality of this approach lies in its combination of foundational proofs, extensibility, and performance, backed by an unconventional take on compiler extensions: unlike traditional compilers, Rupicola generates good code not because of clever built-in optimizations, but because it lets expert users plug in domain- and sometimes program-specific extensions to generate exactly the low-level code that they want. The thesis demonstrates the benefits of this approach through case studies and performance benchmarks that highlight how easy Rupicola makes it to create domain-specific compilers that generate code with performance comparable to that of handwritten C programs. Ph.D. thesis.
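
    In the same spirit as the sketch above, the "program-specific extensions" idea can be mimicked as follows (hypothetical Python, nothing like Rupicola's Coq API, and without the soundness proofs that make the real system foundational): users register rules mapping functional patterns to exactly the low-level code they want.

```python
# Hypothetical sketch of extensible, pattern-directed code generation.
# Rupicola additionally proves each extension sound; this toy does not.
RULES = []

def extension(f):
    """Register a user-supplied compilation rule."""
    RULES.append(f)
    return f

@extension
def compile_sum(expr):
    # Program-specific rule: a list sum becomes an accumulator loop.
    if expr[0] == "sum":
        xs = expr[1]
        return ["acc = 0;",
                f"for (i = 0; i < {xs}_len; i++) acc += {xs}[i];"]

def compile_expr(expr):
    for rule in RULES:                    # first applicable rule wins
        code = rule(expr)
        if code is not None:
            return code
    raise NotImplementedError(f"no rule for {expr[0]!r}")

print("\n".join(compile_expr(("sum", "xs"))))
# acc = 0;
# for (i = 0; i < xs_len; i++) acc += xs[i];
```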

    Trigger Selection Strategies to Stabilize Program Verifiers

    SMT-based program verifiers often suffer from the so-called butterfly effect, in which minor modifications to the program source cause significant instabilities in verification times, which in turn may lead to spurious verification failures and a degraded user experience. This paper identifies matching loops (ill-behaved quantifiers causing an SMT solver to repeatedly instantiate a small set of quantified formulas) as a significant contributor to these instabilities, and describes techniques to detect and prevent them. At their core, the contributed techniques move the trigger selection logic away from the SMT solver and into the high-level verifier: this move allows authors of verifiers to annotate, rewrite, and analyze user-written quantifiers to improve the solver's performance, using information that is easily available at the source level but would be hard to extract from the heavily encoded terms that the solver works with. The paper illustrates three core techniques (quantifier splitting, trigger sharing, and matching loop detection) by extending the Dafny verifier with its own trigger selection routine, and demonstrates significant predictability and performance gains on both Dafny's test suite and large verification efforts using Dafny.
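
    As a concrete illustration (a textbook example, not taken from the paper), consider a quantifier whose trigger matches the very terms its own instantiations produce:

    $$\forall x.\ f(x) = f(g(x)) \qquad \text{with trigger } \{f(x)\}$$

    If the ground term f(a) is present, instantiating with x := a produces f(g(a)), which again matches the trigger with x := g(a), yielding f(g(g(a))), and so on: the solver keeps re-instantiating the same formula forever. Detecting such self-matching patterns at the source level is exactly what the matching-loop analysis described above enables.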

    Coq library for formal topology

    Version of the Coq formal topology library at the time of submission of my Master's thesis.

    The essence of Bluespec: a core language for rule-based hardware design

    The Bluespec hardware-description language presents a significantly higher-level view than hardware engineers are used to, exposing a simpler concurrency model that promotes formal proof, without compromising on the performance of compiled circuits. Unfortunately, the cost model of Bluespec has been unclear, with performance details depending on a mix of user hints and opaque static analysis of potential concurrency conflicts within a design. In this paper we present Koika, a derivative of Bluespec that preserves its desirable properties and yet gives direct control over the scheduling decisions that determine performance. Koika has a novel and deterministic operational semantics that uses dynamic analysis to avoid concurrency anomalies. Our implementation includes Coq definitions of syntax, semantics, key metatheorems, and a verified compiler to circuits. We argue that most of the extra circuitry required for dynamic analysis can be eliminated by compile-time BSV-style static analysis. Defense Advanced Research Projects Agency (DARPA) (Grant HR001118C0018); National Science Foundation (Grant CCF-1521584).
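
    The one-rule-at-a-time flavor of the semantics, and the role of dynamic analysis, can be mimicked in a drastically simplified Python model (hypothetical throughout; the real Koika semantics tracks read/write ports and forwarding, which this toy omits):

```python
# Toy model of rule-based semantics with dynamic conflict detection:
# within a cycle, a rule that would read a register already written by
# an earlier rule aborts, so each cycle's net effect matches some
# sequential execution of a subset of the rules.
class Abort(Exception):
    pass

class Cycle:
    def __init__(self, regs):
        self.regs = regs   # register state committed at cycle start
        self.log = {}      # writes accumulated during this cycle

    def read(self, name):
        if name in self.log:          # would observe an intra-cycle write
            raise Abort()
        return self.regs[name]

    def write(self, name, value):
        self.log[name] = value

def run_cycle(regs, rules):
    cycle = Cycle(regs)
    for rule in rules:                # deterministic scheduler order
        saved = dict(cycle.log)
        try:
            rule(cycle)
        except Abort:                 # conflicting rule: drop its effects;
            cycle.log = saved         # the other rules still fire
    regs.update(cycle.log)
    return regs

rules = [
    lambda c: c.write("counter", c.read("counter") + 1),
    lambda c: c.write("out", c.read("counter")),  # aborts this cycle
]
print(run_cycle({"counter": 0, "out": 0}, rules))
# -> {'counter': 1, 'out': 0}: as if only the first rule ran
```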

    Hydras & Co.: Formalized mathematics in Coq for inspiration and entertainment

    Hydras & Co. is a collaborative library of discrete mathematics for the Coq proof assistant, developed as part of the Coq-community organization on GitHub. The Coq code is accompanied by an electronic book, generated with the help of the Alectryon literate proving tool. We present the evolution of the mathematical contents of the library since earlier presentations at JFLA meetings. Then, we describe how the structure of the project is determined by two requirements which must be continuously satisfied. First, the Coq code needs to be compatible with its ever-evolving dependencies (the Coq proof assistant and several Coq packages both from inside and outside Coq-community) and reverse dependencies (Coq-community projects that depend on it). Second, the book needs to be consistent with the Coq code, which undergoes frequent changes to improve structure and include new material. We believe Hydras & Co. demonstrates that books on formalized mathematics are not limited to providing exposition of theories and reasoning techniques; they can also provide inspiration and entertainment that transcends educational goals.